Learn SQL performance best practices to speed up your production database.
Hit the NOS!Relational databases allow you to connect data across tables via enforced relationships, ensuring data accuracy and allowing you to keep queries efficient via small tables. But what happens when you need to look at related data that’s stored in two or more different tables at the same time?
That’s where the SQL JOIN
statement comes in.
In SQL, a JOIN
statement is used to combine data from two or more tables based on a column that the two of them share. For example, if we have a table of users that contains a user ID, and a table of orders that contains the user ID for the user who placed each order, we could join those tables using that shared user ID column.
Joins can be executed in a variety of ways, and you’ll hear them described using terms like “inner” and “outer”, “left” and “right.” We’ll cover all of these in more depth shortly, but in a nutshell:
If that sounds confusing, don’t worry – we’re going to look at all of these in more detail. But first, let’s establish some example tables so that we can look at something tangible rather than discussing joins in the abstract.
Let’s imagine we’re looking at data from an ecommerce website. We’ve got two very simple tables.
users
Our users
table, which we can also think of as the “left” table or table 1 for the purposes of these exercises, describes a few of the website’s users, including their IC, name, and city. It looks like this:
user_id | name | city |
---|---|---|
1 | beverly | copernicus city |
2 | jean-luc | la barre |
3 | geordi | mogadishu |
orders
Our orders
table, which we can also think of as the “right” table or table 2 for the purposes of these exercises, describes orders from the website, including the order id, the user id of the user who placed the order, and a product name. It looks like this:
order_id | user_id | product |
---|---|---|
1001 | 2 | tea (earl grey) |
1002 | 4 | risa gift card |
1003 | 1 | medical tricorder |
Note that our two tables share a related column, user_id
. This is the column that links these two tables, and it is the column that we will use to execute our various joins in this article.
Note: in a real-world database, we’d probably want to use a foreign key linking the orders table’s user_id column to the user_id column in users, to ensure orders couldn’t enter the database without a valid user associated with them. However, for our purposes demonstrating how joins work here, we won’t bother.
If you would like to follow along using a real SQL database, you can copy and paste the following commands to generate these two tables and insert the relevant data. These commands are written in CockroachDB’s SQL syntax, which is very similar to PostgreSQL. You can create a free serverless CockroachDB database here and then follow along using the CockroachDB SQL client or another SQL client of your choice. You can also use these commands with pretty much any other SQL database, although slight adjustments may be required.
CREATE TABLE users (
user_id INT PRIMARY KEY,
name STRING,
city STRING
);
INSERT INTO users (user_id, name, city) VALUES
(1, 'beverly', 'copernicus city'),
(2, 'jean-luc', 'la barre'),
(3, 'geordi', 'mogadishu');
CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT,
product STRING
);
INSERT INTO orders (order_id, user_id, product) VALUES
(1001, 2, 'tea (earl grey)'),
(1002, 4, 'risa gift card'),
(1003, 1, 'medical tricorder');
Now, we’ll take a look at each type of join and demonstrate an example of this type of join using the example tables we just established.
An INNER JOIN
selects for only the values that both tables share.
Let’s look at how this works in practice: imagine we want to look at a table that makes it easier to see which users have purchased which products. We can do a join that returns the user ID, and name from the users
table, and the product name from the orders
table. An inner join will return those values only for rows where the order_id
exists in both the users
and orders
tables.
Here’s the SQL query that will return our desired results:
SELECT users.user_id, users.name, orders.product
FROM users
INNER JOIN orders ON users.user_id = orders.user_id;
If we read this query line by line, we are:
orders
, the “right” table), and the column to join on. In this case, we’re doing an inner join, so the query should only return the id, name, and city for users whose user ID appears in both tables.And here’s the output that query returns when we execute it in the database with our example tables:
user_id | name | product
----------+----------+--------------------
1 | beverly | medical tricorder
2 | jean-luc | tea (earl grey)
(2 rows)
As we can see, the inner join only returned results for the rows with user IDs 1
and 2
, because these are the only user ids that occur in both tables.
A LEFT JOIN
selects for all of the values from the “left” (first) table, and matching values from the right table. Note that LEFT OUTER JOIN
and LEFT JOIN
do the same thing; some SQL databases may require using one or the other. In CockroachDB, either will work.
Let’s look at how that works in practice. We’ll use the same query we just used, except that we’ll specify that we want a LEFT JOIN
rather than INNER JOIN
. This will return results for all of the user IDs that are in the users table, and match each with the products they purchased in the orders table, if one exists. If not, the missing value will be returned as NULL
.
SELECT users.user_id, users.name, orders.product
FROM users
LEFT JOIN orders ON users.user_id = orders.user_id;
Here are the results that query returns when executed against our example tables:
user_id | name | product
----------+----------+--------------------
1 | beverly | medical tricorder
2 | jean-luc | tea (earl grey)
3 | geordi | NULL
(3 rows)
Here, we can see that the query has indeed returned all results from the “left” table (users
) and matched them correctly to the products they purchased. Note that now the row including Geordi has been returned, whereas it wasn’t returned as part of the inner join. Since Geordi doesn’t appear in the orders
table, a NULL
value has been returned for the product
column.
A RIGHT JOIN
selects for all of the values from the “right” (second) table, and matching values from the right table (again, the OUTER
part of the command is optional).
To put that into practice, we’ll use the same query again, making it a RIGHT JOIN
this time:
SELECT users.user_id, users.name, orders.product
FROM users
RIGHT JOIN orders ON users.user_id = orders.user_id;
And here’s what that query returns when executed against our example tables:
user_id | name | product
----------+----------+--------------------
1 | beverly | medical tricorder
2 | jean-luc | tea (earl grey)
NULL | NULL | risa gift card
(3 rows)
We can see that the query has returned all of the specified results from the “right” (orders) table, and filled in the details from the “left” (users) table where available. Note that again, missing values are indicated with NULL
.
A FULL JOIN
returns all of the results from both tables.
Again, to see what this looks like, we’ll use the same query, changing only the type of join we’re executing:
SELECT users.user_id, users.name, orders.product
FROM users
FULL JOIN orders ON users.user_id = orders.user_id;
And here are the results that query returns when executed against our example tables:
user_id | name | product
----------+----------+--------------------
1 | beverly | medical tricorder
2 | jean-luc | tea (earl grey)
3 | geordi | NULL
NULL | NULL | risa gift card
(4 rows)
Here, we can see we have all of the relevant results from both tables. As with the other join types, NULL
values are inserted to represent missing values.
While the four types of joins listed above are most frequently used, there are some other types of joins (and join-like operations) that are worth mentioning.
SQL tables can be joined with themselves using a JOIN
command, and this can be useful in some situations. For example, imagine we have the following directory
table, which lists the people – both teachers and students – in a school directory:
id | name | teacher_id |
---|---|---|
1 | keiko | |
2 | john | 1 |
3 | susan | 1 |
4 | alexander | 1 |
Since this table is very small, it’s not difficult to see that Keiko is the teacher, and John, Susan, and Alexander are her students. But what if this was a real table with thousands of rows, and we wanted to output the names of students with their assigned teacher’s name next to them, rather than having to constantly cross-reference the ID numbers? We can use a self-join for that.
First, here’s the SQL required to create the directory
table and insert the data, for those following along:
CREATE TABLE directory (
id INT PRIMARY KEY,
name STRING,
teacher_id INT NULL
);
INSERT INTO directory (id, name, teacher_id) VALUES
(1, 'keiko', NULL),
(2, 'john', 1),
(3, 'susan', 1),
(4, 'alexander', 1);
Now that we’ve created our table and inserted the data, we can execute our self-join, which just uses the JOIN
command. Note that we’re giving the table two different aliases, s
and t
, so that we can join it with itself. We’ll also set up some aliases here to make the final output a bit easier to read:
SELECT
s.id AS student_id,
s.name AS student_name,
t.name AS teacher_name
FROM directory AS s
JOIN directory AS t ON s.teacher_id = t.id;
And here’s what that query will output:
student_id | student_name | teacher_name
-------------+--------------+---------------
2 | john | keiko
3 | susan | keiko
4 | alexander | keiko
(3 rows)
A CROSS JOIN
is another special type of SQL. In mathematical terms, cross joins return the Cartesian product of the columns you opt to join.
Let’s go back to our original example table to demonstrate how a cross join functions. Note that this is not a great use case for cross joins; we’re just using this to demonstrate how they work.
Here is the query to execute the join:
SELECT users.name, orders.product
FROM users
CROSS JOIN orders;
And here is what that query outputs when executed against our example tables.
name | product
-----------+--------------------
beverly | tea (earl grey)
beverly | risa gift card
beverly | medical tricorder
jean-luc | tea (earl grey)
jean-luc | risa gift card
jean-luc | medical tricorder
geordi | tea (earl grey)
geordi | risa gift card
geordi | medical tricorder
(9 rows)
As we can see, using a cross join gives us all of the possible combinations of the values in users.name
and orders.product
. In the case of these two particular tables that’s not very helpful, but it could be useful in other circumstances.
As a simple example, imagine you have one table with a list of staff members, and another table with a list of tasks. If all staff members need to complete all tasks, a CROSS JOIN
could be used to quickly generate a checklist similar to the list above, listing each staff member with each task they need to complete.
While UNION
is not a join, it is a method for merging data from two distinct tables into a single output, and thus deserves mention here. UNION
is used to return all distinct results from two separate SELECT
statements, provided both return the similar data types, and provided both statements have the same number of columns.
For example, if we wanted to return a list of all of the names and products in our example database, we could use UNION
like so:
SELECT name FROM users
UNION
SELECT product FROM orders;
That query returns the following output:
name
---------------------
beverly
jean-luc
geordi
tea (earl grey)
risa gift card
medical tricorder
(6 rows)
We’ve looked at a variety of different JOIN
commands here, and while an in-depth discussion of SQL performance best practices is outside the scope of this post, it’s worth noting that joins are not always executed in the same way, and a variety of factors including indexes and the join algorithm used can impact join performance, which can in turn impact the performance of your database as whole.
These factors will vary somewhat depending on the specifics of the relational database you’re using.
You can read more about how to make joins highly performance in CockroachDB here, as well as more about join algorithms and how they’re selected. If you’re using a different relational database, refer to its documentation for guidance on how to optimize the performance of your frequently-used joins.
Foreign keys are an important element of any relational database. But when you’re setting up your database schema, it’s …
Read moreOver the last couple weeks, you might have noticed a new tab on the cluster view of CockroachDB Cloud! We just released …
Read moreI periodically need to make changes to tables. Adding columns is very simple with the ALTER TABLE command… But my tables …
Read more